keywords:"HPC" - Search Results - Digital Repository


	Optimization of Run Configurations of k-Wave Jobs Sasák, Tomáš ; Jaroš, Marta (referee) ; Jaroš, Jiří (advisor) This thesis focuses on scheduling, i.e. estimating suitable configurations for running k-Wave simulations on supercomputers of the IT4Innovations infrastructure, specifically the Salomon and Anselm clusters. A single job consists of a set of many simulations, each executed by a code from the k-Wave toolbox. Running a simulation requires selecting a suitable configuration, meaning the amount of supercomputer resources (number of nodes, i.e. cores) and the duration of the allocation. Finding an ideal configuration is complicated, and even harder for an inexperienced user. The estimate is based on empirical data obtained from multiple executions of different sets of simulations on the given clusters. This data is stored and used by a set of approximators, which perform the actual approximation using interpolation and regression. The text describes the implementation of the final scheduler. Experiments showed that the most efficient methods for this problem are the Akima spline, PCHIP interpolation and the cubic spline. The main contribution of this work is a tool that can find a suitable configuration for a k-Wave simulation without knowledge of the code or extensive experience with its usage.
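As a rough illustration of the approach described in this abstract, the sketch below interpolates measured run times over node counts and picks the cheapest configuration that meets a deadline. It is not the thesis's scheduler: plain piecewise-linear interpolation stands in for the Akima, PCHIP and cubic spline methods named above, and the function names, the node-hours cost model and the sample measurements are all hypothetical.

```python
from bisect import bisect_left


def interpolate(xs, ys, x):
    """Piecewise-linear interpolation of y at x (xs sorted ascending).

    A simple stand-in for the spline methods named in the abstract.
    Values outside the measured range are clamped to the nearest sample.
    """
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_left(xs, x)
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def cheapest_configuration(measured, candidates, deadline_hours):
    """Return (nodes, estimated_hours, node_hours) with the lowest cost.

    measured: dict mapping node count -> measured run time in hours.
    Cost is modelled as nodes * hours, which is an assumption.
    Returns None when no candidate meets the deadline.
    """
    nodes = sorted(measured)
    hours = [measured[n] for n in nodes]
    best = None
    for n in candidates:
        estimate = interpolate(nodes, hours, n)
        if estimate > deadline_hours:
            continue
        cost = n * estimate
        if best is None or cost < best[2]:
            best = (n, estimate, cost)
    return best


# Hypothetical measurements: node count -> run time in hours.
SAMPLE_RUNS = {1: 10.0, 2: 5.5, 4: 3.2, 8: 2.1}
```

Calling `cheapest_configuration(SAMPLE_RUNS, range(1, 9), deadline_hours=4.0)` scans every node count from 1 to 8, including counts that were never measured, which is where the interpolation does its work.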
	Simulation of Ultrasound Propagation in Bones Kadlubiak, Kristián ; Vaverka, Filip (referee) ; Jaroš, Jiří (advisor) It is estimated that a mind-boggling 14.1 million new cases of cancer occurred worldwide in 2012 alone. This number is alarming. Although a healthy lifestyle may reduce the risk of developing cancer, there is always some probability that cancer will develop even in a perfectly fit individual. There are two main conditions for successful treatment of cancer. Firstly, early diagnosis is absolutely crucial. Secondly, there is a need for suitable surgical methods for removing the affected tissue. Ultrasound has great potential to serve both purposes as a non-invasive method. Photoacoustic spectroscopy is an imaging method for tumor detection that makes use of ultrasound, while High-Intensity Focused Ultrasound (HIFU) is a non-invasive surgical method. These methods would be impossible without precise simulations of ultrasound propagation. k-Wave is an open-source MATLAB toolbox implementing such simulations. So why are these methods not already deployed in treatment? Unfortunately, the simulation of ultrasound propagation is a very time-consuming task, which makes it impractical for medical purposes. However, there are a few ways to accelerate these simulations, and the use of GPUs is a very promising one. The main topic of this thesis is the acceleration of the simulation of sound wave propagation in bones and hard tissue. The implementation developed as part of this thesis was benchmarked on various supercomputers, including Anselm in Ostrava and Piz Daint in Lugano. The implemented solution provides a remarkable acceleration compared to the original MATLAB prototype: around 160 times in the best case. This means that a simulation which would otherwise take 6.5 days can now be computed in one hour. This acceleration was achieved using an NVIDIA Tesla P100 to run the simulation with a domain size of 416x416x416 grid points. The thesis includes performance benchmarks on different GPUs to give a comprehensive picture of the acceleration capabilities of the developed implementation, and discusses memory usage and numerical accuracy. Thanks to the implemented solution harnessing the power of modern GPUs, doctors and researchers all around the world have a powerful tool in their hands.
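The figures quoted in this abstract can be checked with a few lines of arithmetic. The sketch below recomputes the accelerated run time from the 6.5-day baseline and the 160x speedup, and estimates the size of one full-domain array on the 416x416x416 grid. The 4 bytes per value (single precision) is an assumption; the abstract does not state the precision used, and the function names are hypothetical.

```python
def accelerated_hours(baseline_days, speedup):
    """Run time in hours after applying a speedup to a baseline given in days."""
    return baseline_days * 24 / speedup


def grid_points(nx, ny, nz):
    """Number of points in a 3D simulation domain."""
    return nx * ny * nz


def array_megabytes(points, bytes_per_value=4):
    """Size of one full-domain array in MB (10**6 bytes).

    bytes_per_value=4 assumes single-precision floats, which the
    abstract does not confirm.
    """
    return points * bytes_per_value / 10**6


if __name__ == "__main__":
    print(accelerated_hours(6.5, 160))                 # hours after speedup
    print(grid_points(416, 416, 416))                  # points in the domain
    print(array_megabytes(grid_points(416, 416, 416)))  # MB per array
```

With these inputs the accelerated run time comes out just under one hour, consistent with the abstract's rounded claim.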
	Non-Blocking Input/Output for the k-Wave Toolbox Kondula, Václav ; Vaverka, Filip (referee) ; Jaroš, Jiří (advisor) This thesis deals with the implementation of a non-blocking I/O interface for the k-Wave project, which is designed for time-domain simulation of ultrasound propagation. The main focus is on large-domain simulations that, due to their high computing power requirements, must run on supercomputers and produce tens of GB of data in a single simulation step. In this thesis, I have designed and implemented a non-blocking interface for storing data using dedicated threads, which allows simulation calculations to overlap with disk operations in order to speed up the simulation. An acceleration of up to 33% was achieved compared to the current implementation of k-Wave, which, among other things, also reduces the cost of the simulation.
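The idea of overlapping computation with disk operations through a dedicated thread can be sketched as a producer-consumer pattern. The code below is only an illustration of that general pattern, not the k-Wave interface: `run_simulation`, `compute_step`, `write_step` and `max_pending` are hypothetical names, and a real implementation would write to HDF5 or a parallel file system rather than call a Python function.

```python
import queue
import threading


def run_simulation(num_steps, compute_step, write_step, max_pending=2):
    """Compute steps on the calling thread while a writer thread stores results.

    max_pending bounds how many finished steps may wait in memory, so the
    computation only stalls when the writer falls too far behind.
    """
    pending = queue.Queue(maxsize=max_pending)
    errors = []

    def writer():
        while True:
            item = pending.get()
            if item is None:      # sentinel: no more data will arrive
                return
            if errors:            # after a failure, keep draining the queue
                continue          # so the producer never blocks forever
            try:
                write_step(*item)
            except Exception as exc:
                errors.append(exc)

    thread = threading.Thread(target=writer)
    thread.start()
    try:
        for step in range(num_steps):
            data = compute_step(step)   # overlaps with writes of earlier steps
            pending.put((step, data))   # blocks only if the queue is full
    finally:
        pending.put(None)
        thread.join()
    if errors:
        raise errors[0]
```

A single writer thread and a FIFO queue keep the steps in order on disk. The bounded queue is a design choice: it caps memory use at the price of occasionally stalling the computation.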
	Installation and configuration of Octave computation cluster Mikulka, Zdeněk ; Hasmanda, Martin (referee) ; Sysel, Petr (advisor) This diploma thesis contains a detailed design of a high-performance cluster, primarily focused on parallel computing in the Octave application. Each component of the cluster is described, along with instructions for its installation and configuration. The cluster is based on the GNU/Linux operating system and the Message Passing Interface. The design allows the cluster to be deployed on classroom computers while lessons are in progress.
	Particle Swarm Optimization on GPUs Záň, Drahoslav ; Petrlík, Jiří (referee) ; Jaroš, Jiří (advisor) This thesis deals with the population-based stochastic optimization technique PSO (Particle Swarm Optimization) and its acceleration. This simple but very effective technique is designed for solving difficult multidimensional problems in a wide range of applications. The aim of this work is to develop a parallel implementation of the algorithm, with an emphasis on finding a solution faster. For this purpose, a graphics card (GPU) providing massive performance was chosen. To evaluate the benefits of the proposed implementation, CPU and GPU implementations were created for solving a problem derived from the well-known NP-hard Knapsack problem. The GPU application achieves an average speedup of 5 times, and a maximum speedup of almost 10 times, over the optimized CPU application on which it is based.
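For readers unfamiliar with PSO, a minimal sequential sketch of the textbook algorithm follows. It is not the thesis's implementation: it runs on the CPU, it minimizes a continuous test function rather than the Knapsack-derived problem, and the parameter values (inertia 0.7, cognitive and social coefficients 1.5) are common defaults chosen here as assumptions.

```python
import random


def pso(objective, dim, bounds, num_particles=30, iterations=200,
        inertia=0.7, cognitive=1.5, social=1.5, seed=0):
    """Minimize `objective` over a box with basic global-best PSO."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)]
           for _ in range(num_particles)]
    vel = [[0.0] * dim for _ in range(num_particles)]
    best_pos = [p[:] for p in pos]              # personal bests
    best_val = [objective(p) for p in pos]
    g = min(range(num_particles), key=best_val.__getitem__)
    gbest_pos, gbest_val = best_pos[g][:], best_val[g]

    for _ in range(iterations):
        for i in range(num_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends momentum, pull toward the particle's own
                # best position, and pull toward the swarm's best position.
                vel[i][d] = (inertia * vel[i][d]
                             + cognitive * r1 * (best_pos[i][d] - pos[i][d])
                             + social * r2 * (gbest_pos[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < gbest_val:
                    gbest_val, gbest_pos = val, pos[i][:]
    return gbest_pos, gbest_val


def sphere(x):
    """Simple convex test function with its minimum of 0 at the origin."""
    return sum(v * v for v in x)
```

Usage: `pso(sphere, dim=3, bounds=(-5.0, 5.0))` returns the best position found and its objective value. The per-particle updates inside one iteration are largely independent of each other, which is what makes the algorithm a natural candidate for GPU parallelization.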
	Development and Programming of Low Power Cluster Hradecký, Michal ; Nikl, Vojtěch (referee) ; Jaroš, Jiří (advisor) This thesis deals with the building and programming of a low-power cluster composed of Hardkernel Odroid XU4 kits based on ARM Cortex A15 and Cortex A7 chips. The goal was to design a simple cluster composed of multiple kits and run a set of benchmarks to analyze its performance and power consumption. The test set consisted of the HPL and Stream benchmarks and various tests of the MPI interface. The overall performance of a cluster of four kits in the HPL benchmark was measured at 23 GFLOP/s in double precision. During this test, the cluster showed a power efficiency of about 0.58 GFLOP/W. The work also describes the installation of the PBS Torque scheduler and of EasyBuild, a build and installation framework for HPC software, on the 32-bit ARM platform. A comparison with the Anselm supercomputer showed that the Odroid cluster is as efficient as a large supercomputer, but at a slightly higher price.
	Efficient Communication in Multi-GPU Systems Špeťko, Matej ; Jaroš, Jiří (referee) ; Vaverka, Filip (advisor) After the introduction of CUDA by Nvidia, GPUs became devices capable of accelerating general-purpose computation. GPUs are designed as parallel processors which possess huge computational power, and modern supercomputers are often equipped with GPU accelerators. Sometimes the performance of a single GPU is not enough for a scientific application, and the application needs to scale over multiple GPUs. During the computation, the GPUs need to exchange partial results. This communication represents computational overhead, so it is important to research methods of efficient communication between GPUs. This means less CPU involvement, lower latency and shared system buffers. This thesis focuses on inter-node and intra-node GPU-to-GPU communication using GPUDirect technologies from Nvidia and CUDA-Aware MPI. Subsequently, the k-Wave toolbox for simulating the propagation of acoustic waves is introduced. This application is accelerated using CUDA-Aware MPI. Peer-to-peer transfer support is also integrated into k-Wave using CUDA Inter-process Communication.
	GridEngine Reporting Tool Rožek, František ; Chalupníček, Kamil (referee) ; Kašpárek, Tomáš (advisor) The aim of this work is to build a tool that reports the utilization of a computing cluster built on Grid Engine technology. Data are processed using PHP and shell scripts and then stored in MySQL or RRD databases. The resulting system handles huge amounts of data and provides a comprehensive view of the utilization of the entire cluster, as well as of its specific components and the statistics of individual users. The solution provides both current and long-term data. The result of this work makes it possible to monitor the computing cluster from a single tool, which was not possible before.
	Analysis of Operational Data and Detection of Anomalies during Supercomputer Job Execution Stehlík, Petr ; Nikl, Vojtěch (referee) ; Jaroš, Jiří (advisor) In recent years, supercomputers have become ever larger and more complex, which brings the problem of exploiting the full potential of the system. This problem is exacerbated by the lack of monitoring tools specifically tailored to the users of these systems. The goal of the thesis is to create a tool, called Examon Web, for the analysis and visualization of supercomputer operational data, and to perform an in-depth analysis of this data using neural networks. These determine whether a given job ran correctly or showed signs of suspicious and undesirable behavior, such as unaligned memory access or low utilization of allocated resources. The user is informed of these findings through the GUI. Examon Web is built on the Examon framework, which collects and processes metric data from the supercomputer and then stores it in the KairosDB database. The implementation spans disciplines from GUI design and implementation, through data analysis, data mining and neural networks, to the implementation of the server-side interface. Examon Web is aimed primarily at users, but can also be used by administrators. The GUI is built in the Angular framework with the Dygraphs and Bootstrap libraries. This lets users analyze the time series of various metrics of their jobs and, like administrators, check the current state of the supercomputer. This state is shown as several globally aggregated metrics over the last 30 minutes, or as a 3D (or 2D) model of the supercomputer that receives data from the nodes themselves via the MQTT protocol. Continuous data retrieval uses a WebSocket interface with a custom mechanism for subscribing to and unsubscribing from the specific metrics displayed in the model. When analyzing an executed job, the user has three different views of it. The first offers an overall summary of the job, reporting the resources used, the run time, and the load on the part of the supercomputer the job used, together with the neural networks' assessment of whether the job is suspicious. The other two views show metrics from the performance and energy perspectives. Training the neural networks required creating a new data set from the Galileo supercomputer. This set contains over 1100 jobs monitored on this supercomputer, of which 500 were manually annotated and then used to train the networks. The neural networks use a back-propagation model, suitable for annotating fixed-length time series. In total, 12 networks were created for metrics including the load on the processor, memory and other components, as well as, for example, the share of total processor time spent in the C6 power-saving state. These networks are independent of each other, and experiments showed that their final configuration of 80-20-4-3-1 (from 80 input neurons to 1 output neuron) gave the best results. A final network (in a 12-4-3-1 configuration) annotated the results of the preceding networks. The overall accuracy of the system for classification into 2 classes is 84%, which is very good for the model used. The work has two outputs. The first is the user interface and its server side, Examon Web, which, as an extension layer of the Examon system, will help spread the system to more users or directly to other supercomputing centers. The second is the partially annotated data set, which can help others in their research and is the result of a collaboration between VUT, UNIBO and CINECA. Both outputs will be released as open source. Examon Web was presented at the 1st Users' Conference in Ostrava, organized by IT4Innovations. Future work may include further annotation of the data set and extending Examon Web with decision trees that determine the exact cause of a job's poor behavior.
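To make the 80-20-4-3-1 network configuration mentioned in this abstract concrete, the sketch below builds a fully connected network with those layer sizes and runs one forward pass. Everything beyond the layer sizes is an assumption: the sigmoid activation, the random untrained weights, the use of bias terms, and the function names are illustrative only, and no training (back-propagation) step is shown.

```python
import math
import random

# Layer sizes taken from the abstract: 80 inputs down to 1 output neuron.
LAYERS = [80, 20, 4, 3, 1]


def init_network(layers, seed=0):
    """Create random weights and zero biases for each pair of adjacent layers."""
    rng = random.Random(seed)
    return [
        ([[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)],
         [0.0] * n_out)
        for n_in, n_out in zip(layers, layers[1:])
    ]


def forward(network, inputs):
    """Propagate one fixed-length input vector through the network.

    Uses a sigmoid activation in every layer, which is an assumption.
    """
    activations = inputs
    for weights, biases in network:
        activations = [
            1.0 / (1.0 + math.exp(-(sum(w * a for w, a in zip(row, activations)) + b)))
            for row, b in zip(weights, biases)
        ]
    return activations


def parameter_count(layers):
    """Number of weights plus biases in a fully connected network."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layers, layers[1:]))
```

For example, `forward(init_network(LAYERS), [0.5] * 80)` yields a single value between 0 and 1, which a trained network of this shape could use as a two-class score for one metric. The same helpers accept `[12, 4, 3, 1]` for the final combining network.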
	Efficient Implementation of High Performance Algorithms on Intel Xeon Phi Šimek, Dominik ; Hrbáček, Radek (referee) ; Jaroš, Jiří (advisor) This thesis is dedicated to the implementation of high-performance algorithms on the Intel Xeon Phi coprocessor. The Xeon Phi was introduced by Intel in 2012 as a new MIC (Many Integrated Core) architecture. The theoretical part of the thesis focuses on the architecture of the coprocessor (with a peak performance of 2 TFLOPS in single precision) and on the procedure for implementing and optimizing algorithms. This theoretical knowledge is then applied to practical examples demonstrating the implementation and optimization of algorithms and work with the coprocessor. In the practical part, simple benchmarks such as vector-matrix multiplication and matrix multiplication are explained and implemented. The first benchmark achieved 6.5% of the theoretical coprocessor performance; the second achieved much more. The following chapter discusses a more complex benchmark, a simulation of a particle system (N-Body), which reached more than 35% of the coprocessor performance (725 GFLOPS). The next section is dedicated to some interesting problems such as optimization of the MATLAB module k-Wave (propagation of ultrasound waves), extraction of i-vectors (speech processing), and cross-compilation of existing libraries, modules and programs. The conclusion evaluates the potential of the Intel Xeon Phi.
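The efficiency figures in this abstract relate achieved throughput to the coprocessor's peak. A small sketch of that arithmetic follows, using the rounded 2 TFLOPS peak quoted above; the exact peak of the tested card may differ slightly, and the function names are hypothetical.

```python
PEAK_GFLOPS = 2000.0  # 2 TFLOPS single precision, rounded figure from the abstract


def fraction_of_peak(achieved_gflops, peak_gflops=PEAK_GFLOPS):
    """Achieved throughput as a fraction of theoretical peak."""
    return achieved_gflops / peak_gflops


def achieved_from_fraction(fraction, peak_gflops=PEAK_GFLOPS):
    """Throughput in GFLOPS implied by a given fraction of peak."""
    return fraction * peak_gflops


if __name__ == "__main__":
    # N-Body benchmark: 725 GFLOPS against the rounded peak.
    print(fraction_of_peak(725.0))
    # First benchmark: 6.5% of peak expressed in GFLOPS.
    print(achieved_from_fraction(0.065))
```

Against the rounded peak, 725 GFLOPS corresponds to a little over 36%, in line with the "more than 35%" stated in the abstract.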
